Intro

This is a first step toward using Twitter to measure the polarity of the US population toward political personalities.

To start working with the Twitter API, a deep learning model such as BERT, and some data visualization in this context, I searched Twitter for tweets containing the handle @POTUS, every day for a week, covering as many US cities and states as possible. Once the tweets were collected, I ran a BERT sentiment classifier that I fine-tuned myself (positive, negative, neutral) to embed them in a sentiment space.

I wanted to see whether states that were very similarly or oppositely polarized in the way they voted in US elections also show, on average, similar or opposite sentiment trends over time.

Evolution of the polarity in presidential election over time and states

Only two states are plotted, but the code allows you to plot all of them.

USA map colored by weighted average polarity since 2000

Overview of data collected

Output of deep learning model

Evolution of the polarity in presidential election over time and states

Here polarity is defined as: $\frac{|n_{vote}(\text{Democrat})-n_{vote}(\text{Republican})|}{n_{vote}(\text{Democrat})+n_{vote}(\text{Republican})}$
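As a minimal sketch of this definition (the function name and the vote counts are made up for illustration and are not taken from the notebook):

```python
def polarity(dem_votes: int, rep_votes: int) -> float:
    """Absolute Democrat/Republican vote margin, as a fraction of the two-party vote."""
    total = dem_votes + rep_votes
    if total == 0:
        raise ValueError("no two-party votes recorded")
    return abs(dem_votes - rep_votes) / total


# Illustrative numbers only, not real election results.
print(polarity(600_000, 400_000))  # 0.2
print(polarity(500_000, 500_000))  # 0.0
```

A value of 0 means a perfectly split state, and a value of 1 means every two-party vote went to the same side.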


USA map colored by weighted average polarity since 2000

The average is weighted by the total number of voters per year and per state.
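One way to read this, sketched below, is that each election year's polarity for a state is weighted by that year's number of voters in that state. The function name and the numbers are hypothetical.

```python
def weighted_average_polarity(polarities, voters):
    """Average per-election polarities, weighting each election by its voter count."""
    if len(polarities) != len(voters):
        raise ValueError("polarities and voters must have the same length")
    total_voters = sum(voters)
    if total_voters == 0:
        raise ValueError("total number of voters is zero")
    return sum(p * v for p, v in zip(polarities, voters)) / total_voters


# Illustrative values for one state over three elections (not real data).
state_polarities = [0.10, 0.20, 0.30]
state_voters = [1_000_000, 2_000_000, 1_000_000]
print(weighted_average_polarity(state_polarities, state_voters))
```

With these weights, an election with a higher turnout pulls the average toward its own polarity.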


A bit of data handling: not interesting

Overview of data collected

I wrote a script that, every day, requests tweets from two days earlier (to be sure that every state has been through the whole day) containing @POTUS in their text, for 7768 cities in the US (20 km radius around each city's defining GPS coordinates). At the end of these 8 days of scraping Twitter, I was able to find only 1323 of those cities, which translates as follows in terms of sampling per state. The sampling is quite uneven.
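The request for one city and one day could be parameterized roughly as below. This is only a sketch: the notebook text does not say which endpoint or client library was used, so the `q` and `geocode` parameter names (borrowed from Twitter's v1.1 standard search API) and the helper `build_search_params` are assumptions.

```python
from datetime import date, timedelta


def build_search_params(lat, lon, today, handle="@POTUS", radius_km=20, lag_days=2):
    """Build search parameters for one city and one target day (hypothetical helper)."""
    target_day = today - timedelta(days=lag_days)  # the day that is fully over everywhere
    next_day = target_day + timedelta(days=1)
    return {
        # Tweets mentioning the handle, restricted to the target day.
        "q": f"{handle} since:{target_day.isoformat()} until:{next_day.isoformat()}",
        # Circle around the city's coordinates: "latitude,longitude,radius".
        "geocode": f"{lat},{lon},{radius_km}km",
    }


# Example coordinates chosen for illustration only.
print(build_search_params(38.8977, -77.0365, today=date(2021, 3, 10)))
```

Running this once per city per day gives one query per (city, day) pair, which is the sampling unit described above.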

So there is a big difference in sampling between states, and between weekdays and the weekend.

I get rid of the states that don't have at least 50 tweets per day.

I get rid of the first day
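These two filtering steps can be sketched as follows. The representation of a tweet as a `(state, day)` pair and the name `filter_tweets` are hypothetical, and "at least 50 tweets per day" is read here as "at least 50 tweets on every collected day", which is an assumption.

```python
from collections import Counter


def filter_tweets(tweets, min_per_day=50):
    """Keep well-sampled states only, then drop the first collected day.

    `tweets` is a list of (state, day) pairs, one pair per tweet.
    """
    days = sorted({day for _, day in tweets})
    states = {state for state, _ in tweets}
    counts = Counter(tweets)  # number of tweets per (state, day); 0 if absent

    # A state is kept only if every collected day reaches the threshold.
    well_sampled = {
        state for state in states
        if all(counts[(state, day)] >= min_per_day for day in days)
    }

    kept_days = set(days[1:])  # discard the first day
    return [(s, d) for s, d in tweets if s in well_sampled and d in kept_days]
```

Because a missing (state, day) pair counts as zero tweets, a state with no tweets at all on some day is dropped as well.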


Output of deep learning model

For each sentence, the deep learning model gives its representation in a space described by 3 axes: positive, negative, neutral. Every sentence is a normalized vector in this space (its three components sum to one).

To describe the emotion per state, I simply average all the tweets from the same state and the same day. I am thus able to reconstruct a time series of Twitter emotion per state.
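A minimal sketch of this averaging step, assuming each classified tweet is available as a `(state, day, (positive, negative, neutral))` tuple (this layout and the function name are hypothetical):

```python
from collections import defaultdict


def daily_state_means(rows):
    """Average the 3 sentiment scores per (state, day); return {state: {day: mean_vector}}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for state, day, scores in rows:
        key = (state, day)
        for i, value in enumerate(scores):
            sums[key][i] += value
        counts[key] += 1

    series = defaultdict(dict)
    for (state, day), total in sums.items():
        series[state][day] = tuple(v / counts[(state, day)] for v in total)
    # Sort days so that each state's dictionary reads as a time series.
    return {state: dict(sorted(days.items())) for state, days in series.items()}


# Toy scores, not real classifier output.
rows = [
    ("TX", "2021-03-02", (1.0, 0.0, 0.0)),
    ("TX", "2021-03-02", (0.0, 1.0, 0.0)),
    ("TX", "2021-03-01", (0.0, 0.0, 1.0)),
]
print(daily_state_means(rows))
```

Since every tweet vector sums to one, each daily mean also sums to one, so the averaged points stay on the same plane as the individual tweets.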

PCA: all the points

As expected, since one dimension is constrained by the components summing to one, the actual dimensionality is 2. Nicely, the positive axis mainly follows the first principal axis, and the second axis differentiates negative from neutral.
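The loss of one dimension can be checked on synthetic data. The sketch below uses random vectors that sum to one as a stand-in for the classifier outputs (not the real tweets) and computes the PCA through an SVD with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 500 rows of (positive, negative, neutral), each summing to 1.
scores = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=500)

centered = scores - scores.mean(axis=0)
# PCA via SVD: rows of `components` are the principal axes.
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

print(explained.round(3))   # share of variance per axis; the third is numerically zero
print(components.round(2))  # loadings of each axis on (positive, negative, neutral)
```

The third axis carries no variance because it points along (1, 1, 1), the direction fixed by the sum-to-one constraint. The loadings matrix is what tells you which emotions each of the two remaining axes mostly follows.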

No pattern noticeable


Time series projected on the full 2D PCA

We saw above that those PCA axes are more or less explained by the emotions, and this is just a matrix representation of that. I keep it here so it is easier to go back to when trying to make sense of the graph below.

In this visualization you can see some anticorrelation!


Distance correlation between time series of tweet emotions (not significant)

Distance correlation is used here to check the correlation between time series in a multidimensional space. I understand most of it, except maybe why the unbiased estimator can be negative...
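On the negative values, one likely explanation is that an unbiased estimator of the squared distance covariance has to average out to the true value. When the true value is close to zero, individual estimates therefore fall on both sides of zero. The sketch below implements the U-centered (bias-corrected) squared distance correlation with NumPy, following my reading of Székely and Rizzo's formulas. It is not necessarily the implementation used in this notebook, and the function names are hypothetical.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows (time points) of x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def u_center(d):
    """U-centering of a distance matrix (zero diagonal, n - 2 and n - 1 corrections)."""
    n = d.shape[0]
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    total = d.sum() / ((n - 1) * (n - 2))
    centered = d - row - col + total
    np.fill_diagonal(centered, 0.0)
    return centered


def unbiased_dcor_sq(x, y):
    """Bias-corrected squared distance correlation; it can be negative."""
    n = len(x)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    a = u_center(pairwise_distances(x))
    b = u_center(pairwise_distances(y))
    scale = n * (n - 3)
    cov_xy = (a * b).sum() / scale
    var_x = (a * a).sum() / scale
    var_y = (b * b).sum() / scale
    return cov_xy / np.sqrt(var_x * var_y)


rng = np.random.default_rng(1)
# Independent synthetic "time series": 7 days, 2 dimensions each.
values = [unbiased_dcor_sq(rng.normal(size=(7, 2)), rng.normal(size=(7, 2)))
          for _ in range(200)]
print(min(values), max(values))  # estimates scatter on both sides of zero
```

With only a handful of days per series, the spread of these estimates under independence is wide, which fits with none of the observed values standing out as significant.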

Unfortunately, none of those distance correlations is significant, so nothing below is significant either. Still, conceptually, it is something we could consider for further studies.


Hierarchical clustering of the distance correlations : for fun since it is not significant


Multidimensional scaling using the distance correlation as similarities : for fun too
